4. Identify Lines Manually Using HITRAN

The key tasks in this tutorial are to:

  • 4.1 | Load the baseline corrected spectrum

  • 4.2 | Load the HITRAN linelist for that molecule

  • 4.3 | Find the peaks manually using Bokeh, click and print

  • 4.4 | Match the HITRAN lines to the peaks

  • 4.5 | Report, plot, and save the results

[1]:
# Import necessary modules
import os
import numpy as np
import pandas as pd
from Xpectra.SpecFitAnalyzer import SpecFitAnalyzer
from Xpectra.LineAssigner import *
from Xpectra.SpecStatVisualizer import plot_fitted_als_bokeh, plot_spectra_errorbar_bokeh

4.1 Load the original and baseline-corrected spectra

\(\rightarrow\) In notebook 2, we corrected the spectral baseline and saved the result as a CSV file in the processed_data directory. Here we load that file into a DataFrame and extract the original and baseline-corrected spectra as NumPy arrays:

[2]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Import baseline corrected spectrum
corrected_spectrum = pd.read_csv(os.path.join(__reference_data_path__,'processed_data','arpls_baseline_corrected_methane_spectrum.csv'))

# Assign wavenumber (x) and signal (y) arrays
x = corrected_spectrum['original_x'].dropna().to_numpy()
y = corrected_spectrum['original_y'].dropna().to_numpy()

x_baseline_corr = corrected_spectrum['baseline_corrected_x'].dropna().to_numpy()
y_baseline_corr = corrected_spectrum['baseline_corrected_y'].dropna().to_numpy()

\(\rightarrow\) Visualize the original spectrum together with its fitted baseline:

[3]:
# Obtain previously fitted baseline by reverse correcting the spectrum
spectral_baseline = y - y_baseline_corr

plot_fitted_als_bokeh(wavenumber_values = x,
                      signal_values = y,
                      fitted_baseline = spectral_baseline,
                      baseline_type = 'arpls'
                     )
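
\(\rightarrow\) The baseline recovery above relies on the original and baseline-corrected signals sharing the same wavenumber grid, since the subtraction is element-wise. A minimal sketch with synthetic arrays (the `*_demo` names and values are illustrative assumptions, not the methane data) shows the idea:

```python
import numpy as np

# Synthetic stand-ins for the spectrum (illustrative values only)
x_demo = np.linspace(2900.0, 2901.0, 5)
baseline_demo = 0.1 + 0.05 * (x_demo - 2900.0)    # slowly varying baseline
peaks_demo = np.array([0.0, 0.4, 0.0, 0.2, 0.0])  # baseline-free signal
y_demo = peaks_demo + baseline_demo               # "measured" signal
y_corr_demo = y_demo - baseline_demo              # baseline-corrected signal

# The subtraction is element-wise, so both arrays must share one grid
assert y_demo.shape == y_corr_demo.shape

# Reverse the correction to recover the fitted baseline
recovered_baseline = y_demo - y_corr_demo
```

If the two columns in the CSV file had different lengths, the subtraction would fail or broadcast unexpectedly, so a shape check before this step can be worthwhile.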

4.2 Load and parse the HITRAN line list

\(\rightarrow\) The next step is to load the HITRAN line list into a DataFrame. For this, we use the LineAssigner module, instantiating it with the baseline-corrected spectrum and the path to the HITRAN file.

[4]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Define path to HITRAN data
input_file = os.path.join(__reference_data_path__, 'datasets','CH4_nu3.par')

# Initialize LineAssigner
assign = LineAssigner(wavenumber_values = x_baseline_corr,
                      signal_values = y_baseline_corr,
                      hitran_file = input_file,
                      absorber_name= 'CH4')

\(\rightarrow\) With the class initialized, we now parse the line list to a DataFrame. The default columns converted to the DataFrame are: ‘local_iso_id’, ‘nu’, ‘sw’, ‘gamma_air’, ‘local_upper_quanta’, and ‘ierr’.

\(\rightarrow\) This function automatically separates the local quanta into the J quantum number, the N quantum number, and the symmetry for the lower and upper states (columns J_low, sym_low, N_low, J_up, sym_up, and N_up).

[5]:
# Parse file to DataFrame
assign.parse_file_to_dataframe()
[5]:
molec_id local_iso_id nu sw a gamma_air gamma_self elower n_air delta_air ... iref line_mixing_flag gp gpp J_low sym_low N_low J_up sym_up N_up
0 6 2 2900.000621 1.825000e-25 0.023890 0.0490 0.067 814.6845 0.63 -0.005800 ... 64 3 3253433.0 None 12 A1 1 13 A2 9
1 6 2 2900.005693 6.307000e-27 0.005030 0.0470 0.065 1096.0334 0.62 -0.005800 ... 64 3 3253433.0 None 14 F2 3 14 F1 40
2 6 2 2900.022027 3.048000e-27 0.022620 0.0460 0.060 1593.6378 0.61 -0.005800 ... 64 3 3253433.0 None 17 F2 2 17 F1 47
3 6 1 2900.027223 1.891000e-25 0.000465 0.0480 0.067 815.1315 0.63 -0.005800 ... 34 3 3245363.0 None 12 F1 3 13 F2 21
4 6 2 2900.035027 1.905000e-25 0.067460 0.0400 0.067 815.0317 0.63 -0.005800 ... 64 3 3253433.0 None 12 E 2 12 E 25
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41623 6 2 3299.877822 1.652000e-29 0.000353 0.0450 0.059 1780.0695 0.60 -0.006500 ... 54 3 3253433.0 None 18 F2 3 19 F1 85
41624 6 1 3299.900527 5.946000e-29 0.000004 0.0380 0.061 1416.5543 0.61 -0.006500 ... 34 3 3240363.0 None 16 E 1 17 E 52
41625 6 3 3299.901848 7.204000e-29 0.000221 0.0589 0.077 532.9581 0.75 -0.006346 ... 44 4 2243323.0 None 11 E 4 11 E 2
41626 6 1 3299.984795 2.838000e-25 0.035670 0.0470 0.099 1526.2146 0.75 -0.006600 ... 32 3 3333232.0 None 6 A2 1 6 A1 31
41627 6 2 3299.989099 5.343000e-29 0.000730 0.0380 0.060 1594.0043 0.61 -0.006500 ... 54 3 3253433.0 None 17 E 2 18 E 54

41628 rows × 25 columns
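
\(\rightarrow\) For orientation, a HITRAN .par file stores one transition per fixed-width record. The sketch below reads the first ten fields of such a record, assuming the standard 160-character HITRAN layout. It is not the parser used by LineAssigner; the names `FIELDS`, `parse_par_line`, and `sample_line` are hypothetical, and the sample record is synthetic, built from the values in the first row of the table above.

```python
# First ten fields of the fixed-width HITRAN .par record: (name, width, type)
FIELDS = [
    ("molec_id", 2, int),
    ("local_iso_id", 1, int),
    ("nu", 12, float),
    ("sw", 10, float),
    ("a", 10, float),
    ("gamma_air", 5, float),
    ("gamma_self", 5, float),
    ("elower", 10, float),
    ("n_air", 4, float),
    ("delta_air", 8, float),
]


def parse_par_line(line):
    """Slice one record into named fields using the widths in FIELDS."""
    record, start = {}, 0
    for name, width, cast in FIELDS:
        record[name] = cast(line[start:start + width])
        start += width
    return record


# Synthetic record (assumption): each value right-aligned in its field
raw_values = ["6", "2", "2900.000621", "1.825E-25", "2.389E-02",
              ".0490", "0.067", "814.6845", "0.63", "-.005800"]
sample_line = "".join(value.rjust(width)
                      for value, (_, width, _) in zip(raw_values, FIELDS))

parsed = parse_par_line(sample_line)
```

The remaining fields of the record (quantum numbers, error and reference codes, and statistical weights) follow the same slicing pattern, but the quantum-number fields need molecule-specific handling.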

\(\rightarrow\) The HITRAN DataFrame is now accessible through the instance attribute hitran_df:

[6]:
# Display header and first 3 rows
assign.hitran_df.head(3)
[6]:
molec_id local_iso_id nu sw a gamma_air gamma_self elower n_air delta_air ... iref line_mixing_flag gp gpp J_low sym_low N_low J_up sym_up N_up
0 6 2 2900.000621 1.825000e-25 0.02389 0.049 0.067 814.6845 0.63 -0.0058 ... 64 3 3253433.0 None 12 A1 1 13 A2 9
1 6 2 2900.005693 6.307000e-27 0.00503 0.047 0.065 1096.0334 0.62 -0.0058 ... 64 3 3253433.0 None 14 F2 3 14 F1 40
2 6 2 2900.022027 3.048000e-27 0.02262 0.046 0.060 1593.6378 0.61 -0.0058 ... 64 3 3253433.0 None 17 F2 2 17 F1 47

3 rows × 25 columns

4.3 Identify peaks manually

\(\rightarrow\) We move on to identifying the location (in wavenumber) of each peak in our methane spectrum. To accomplish this, we use the line_finder_manual method of the LineAssigner instance, which lets us pick peaks on an interactive Bokeh plot and print their coordinates.

4.3.1 Select wavenumber range for analysis

\(\rightarrow\) Often we are interested in only part of the spectrum, or the full spectrum has too many peaks to process at once. We therefore select a range of wavenumbers for our analysis:

[7]:
wavenumber_range = (2911.15, 2911.9) # cm^-1
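
\(\rightarrow\) Restricting arrays to such a range amounts to applying a boolean mask. The sketch below is a generic NumPy illustration with made-up points; `select_range` is a hypothetical helper, not part of Xpectra.

```python
import numpy as np


def select_range(wavenumbers, signal, wavenumber_range):
    """Return the points whose wavenumber lies inside the closed range."""
    lower, upper = wavenumber_range
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    signal = np.asarray(signal, dtype=float)
    mask = (wavenumbers >= lower) & (wavenumbers <= upper)
    return wavenumbers[mask], signal[mask]


# Illustrative points only (not the methane spectrum)
x_demo = np.array([2911.0, 2911.2, 2911.5, 2911.8, 2912.0])
y_demo = np.array([0.1, 0.5, 0.2, 0.3, 0.1])

x_sel, y_sel = select_range(x_demo, y_demo, (2911.15, 2911.9))
```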

\(\rightarrow\) Let's visualize the spectrum within this wavenumber range:

[8]:
plot_spectra_errorbar_bokeh(wavenumber_values = x_baseline_corr,
                            signal_values = y_baseline_corr,
                            wavenumber_range = wavenumber_range,
                            absorber_name = 'CH4',
                            plot_type = 'line')

4.3.2 Find the peaks

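\(\rightarrow\) Find the spectral peaks manually by clicking on them in the interactive Bokeh plot; the coordinates of each selected point are printed: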

[9]:
assign.line_finder_manual(wavenumber_range=wavenumber_range)

\(\rightarrow\) Paste the printed peak coordinates into a list, then extract the peak centers and heights:

[10]:
guesses_list = [[2911.187, 0.504], [2911.262, 0.594], [2911.287, 0.403],
                [2911.350, 0.545], [2911.402, 0.450], [2911.518, 0.160],
                [2911.623, 0.549], [2911.676, 0.100], [2911.698, 0.195]]

initial_guesses = np.array(guesses_list)

peak_centers = initial_guesses[:,0]
peak_heights = initial_guesses[:,1]
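
\(\rightarrow\) Because the coordinates are pasted by hand, a quick sanity check can catch typos before matching. The sketch below uses the first three guesses from the list above and is only a suggestion; the checks themselves are an assumption about what is worth verifying.

```python
import numpy as np

wavenumber_range = (2911.15, 2911.9)  # cm^-1
guesses = np.array([[2911.187, 0.504], [2911.262, 0.594], [2911.287, 0.403]])

# Column 0 holds the peak centers, column 1 the peak heights
centers, heights = guesses[:, 0], guesses[:, 1]

# Every center should fall inside the selected range
in_range = (centers >= wavenumber_range[0]) & (centers <= wavenumber_range[1])

# Centers should be strictly increasing if they were picked left to right
is_sorted = bool(np.all(np.diff(centers) > 0))
```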

\(\rightarrow\) Store the peak centers on the LineAssigner instance:

[11]:
assign.peak_centers_manual = peak_centers

4.4 Identify the lines

\(\rightarrow\) Compare the manually selected peaks with known lines: for each peak in the lab spectrum, find the closest line in the HITRAN line list.

[12]:
# Filter the HITRAN line list
filters = {'local_iso_id' : [1,2]} # Only search the two most abundant isotopologues


# Match found lines, plot them over spectrum, and display DataFrame
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['local_iso_id', 'J_up','nu','peak_center'], # Print over each line
                            wavenumber_range = wavenumber_range,
                            __print__ = True, # Display the fitted HITRAN DataFrame
                            __plot_bokeh__ = True, # Plot interactively with Bokeh
                            __plot_seaborn__ = True
                           )
[Figure: matched HITRAN lines plotted over the spectrum]
molec_id local_iso_id nu sw a gamma_air gamma_self elower n_air delta_air ... line_mixing_flag gp gpp J_low sym_low N_low J_up sym_up N_up peak_center
0 6 1 2911.186061 5.284000e-23 0.057940 0.0576 0.070 575.2596 0.67 -0.007580 ... 3 4345363.0 None 10 F1 2 9 F2 35 2911.187
1 6 1 2911.261561 6.751000e-23 0.074010 0.0572 0.070 575.1841 0.67 -0.008480 ... 3 4345363.0 None 10 F1 1 9 F2 35 2911.262
2 6 1 2911.285780 3.903000e-23 0.042810 0.0576 0.070 575.2852 0.67 -0.007600 ... 3 4345363.0 None 10 F2 3 9 F1 36 2911.287
3 6 2 2911.348367 5.866000e-23 0.602700 0.0618 0.085 104.7777 0.75 -0.002122 ... 3 3335212.0 None 4 A1 1 5 A2 6 2911.350
4 6 1 2911.401080 4.331000e-23 0.047480 0.0573 0.070 575.1699 0.67 -0.008890 ... 3 4345363.0 None 10 F2 2 9 F1 36 2911.402
5 6 1 2911.518480 1.271000e-23 0.013930 0.0583 0.070 575.0525 0.67 -0.008430 ... 3 4345363.0 None 10 F2 1 9 F1 36 2911.518
6 6 1 2911.622555 5.719000e-23 0.037600 0.0587 0.070 575.0555 0.67 -0.008330 ... 3 4345363.0 None 10 A2 1 9 A1 11 2911.623
7 6 1 2911.674563 7.653000e-24 3.939000 0.0390 0.062 1817.8431 0.75 -0.005823 ... 3 3333232.0 None 9 F2 7 8 F1 75 2911.676
8 6 2 2911.697399 3.172000e-29 0.000225 0.0450 0.060 1594.1021 0.61 -0.005800 ... 3 3253433.0 None 17 F1 4 18 F2 27 2911.698

9 rows × 26 columns
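
\(\rightarrow\) Conceptually, this kind of assignment is a nearest-neighbour search with a cutoff. The sketch below illustrates the idea under the assumption that threshold is the maximum allowed offset in cm^-1; it is not necessarily how hitran_line_assigner is implemented, and `match_nearest_lines` is a hypothetical helper. The line positions are the first four nu values from the table above, and the last peak (2911.450) is invented to show an unmatched case.

```python
import numpy as np


def match_nearest_lines(line_positions, peak_centers, threshold):
    """Pair each peak with the nearest line, or None if it is too far away."""
    line_positions = np.asarray(line_positions, dtype=float)
    matches = []
    for center in peak_centers:
        offsets = np.abs(line_positions - center)
        idx = int(np.argmin(offsets))  # index of the closest line
        if offsets[idx] <= threshold:
            matches.append((float(center), float(line_positions[idx])))
        else:
            matches.append((float(center), None))
    return matches


hitran_nu = [2911.186061, 2911.261561, 2911.285780, 2911.348367]
peaks = [2911.187, 2911.262, 2911.287, 2911.450]  # last peak is hypothetical

matches = match_nearest_lines(hitran_nu, peaks, threshold=0.02)
```

A tighter threshold reduces false assignments in crowded regions, at the cost of leaving more peaks unmatched.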


4.5 Save the results: plots and DataFrames

\(\rightarrow\) Save the plot by setting __save_plot__ to True:

[13]:
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['nu','peak_center'],
                            wavenumber_range = wavenumber_range,
                            __save_plot__ = True, # Save the plot (seaborn version)
                            __reference_data__ = __reference_data_path__)
<Figure size 7000x4200 with 0 Axes>
[14]:
# Add the manually selected peak heights as a new column
assign.fitted_hitran['peak_heights'] = peak_heights

\(\rightarrow\) Save the fitted HITRAN DataFrame to a CSV file:

[15]:
df = assign.fitted_hitran

# Define file name
file_name = "closest_hitran_lines_manual.csv"

# Save DataFrame to CSV
df.to_csv(os.path.join(__reference_data_path__,'processed_data',file_name), index=False)
[16]:
# Next steps:
# - Add a header to the saved file with laboratory details: who recorded the
#   spectrum, the date, the instrument, the resolution, and the wavenumber range.
# - Note in the header that the data have been processed, including the
#   percentage of negative/NaN values that were deleted or averaged.
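
\(\rightarrow\) One possible way to carry such a header is to write it as comment lines ahead of the CSV data. The sketch below uses only the standard library; `write_csv_with_header` is a hypothetical helper, the metadata keys are assumptions, and the bracketed values are placeholders to be filled in.

```python
import csv
import io


def write_csv_with_header(stream, metadata, fieldnames, rows):
    """Write '# key: value' comment lines, then a regular CSV table."""
    for key, value in metadata.items():
        stream.write(f"# {key}: {value}\n")
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder metadata (assumed keys; values to be supplied by the user)
metadata = {"recorded_by": "<name>", "instrument": "<instrument>"}
rows = [{"nu": 2911.186061, "peak_center": 2911.187, "peak_heights": 0.504}]

buffer = io.StringIO()  # stands in for an open file
write_csv_with_header(buffer, metadata, ["nu", "peak_center", "peak_heights"], rows)
text = buffer.getvalue()

# Reading back: skip the comment lines, then parse the table
data_lines = [line for line in text.splitlines() if not line.startswith("#")]
records = list(csv.DictReader(data_lines))
```

When reading such a file with pandas, the comment='#' argument of pd.read_csv skips the header lines.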